Educational Applications of Latent Semantic Analysis

Authors

  • Darrell Laham
  • Thomas K. Landauer
Abstract

LSA, a mathematical modeling technique, captures the essential relationships between text documents and word meaning, or semantics, the knowledge base which must be accessed to evaluate the quality of content. Several educational applications that employ LSA have been developed: (1) selecting the most appropriate text for learners with variable levels of background knowledge, (2) automatically grading the content of an essay, and (3) helping students effectively summarize material.

Introduction

Latent Semantic Analysis (LSA) is a mathematical/statistical technique for extracting and representing the similarity of meaning of words and passages by analysis of large bodies of text. It uses singular value decomposition, a general form of factor analysis, to condense a very large matrix of word-by-context data into a much smaller, but still large (typically 100-500 dimensional), representation (Deerwester, Dumais, Furnas, Landauer & Harshman, 1990). The right number of dimensions appears to be crucial; the best values yield simulations of human judgments up to four times as accurate as those of ordinary co-occurrence measures. The similarity between the resulting vectors for words and contexts, as measured by the cosine of their contained angle, has been shown to closely mimic human judgments of meaning similarity and human performance based on such similarity in a variety of ways. For example, after training on about 2,000 pages of English text, it scored as well as average test-takers on the synonym portion of TOEFL, the ETS Test of English as a Foreign Language (Landauer & Dumais, 1997). After training on an introductory psychology textbook, it achieved a passing score on a multiple-choice exam (Landauer, Foltz & Laham, in prep). LSA significantly improves automatic information retrieval by allowing user requests to find relevant text on a desired topic even when the text contains none of the words used in the query (Dumais, 1991, 1994).
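The two operations described above, reducing a word-by-context matrix with singular value decomposition and comparing the resulting vectors by cosine, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy count matrix, the function names `lsa_vectors` and `cosine`, and the choice of k = 2 are assumptions made for illustration, and the sketch omits the term weighting that is commonly applied before the decomposition.

```python
import numpy as np

def lsa_vectors(counts, k):
    """Reduce a term-by-passage count matrix to k dimensions with SVD."""
    # counts = U @ diag(s) @ Vt, singular values sorted largest first
    U, s, Vt = np.linalg.svd(counts, full_matrices=False)
    # Keep the k largest dimensions and scale each by its singular value.
    term_vecs = U[:, :k] * s[:k]
    passage_vecs = Vt[:k, :].T * s[:k]
    return term_vecs, passage_vecs

def cosine(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Hypothetical toy data: rows are terms, columns are passages.
terms = ["heart", "blood", "artery", "essay", "grade"]
counts = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
], dtype=float)

term_vecs, passage_vecs = lsa_vectors(counts, k=2)

# Passages 0 and 1 share vocabulary; passages 0 and 2 share none.
sim_same_topic = cosine(passage_vecs[0], passage_vecs[1])
sim_other_topic = cosine(passage_vecs[0], passage_vecs[2])
print(sim_same_topic, sim_other_topic)
```

In a realistic setting the matrix would cover tens of thousands of terms and contexts, and k would be chosen empirically within the 100-500 range mentioned above.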
• For mathematical and computational details of the LSA method, see Deerwester et al. (1990); Landauer & Dumais (1997); Landauer, Foltz, & Laham (in press).

HCIC '98 Boaster: Educational LSA

Text Selection

• This section is extracted from Wolfe, Schreiner, Rehder, Laham, Foltz, Kintsch, & Landauer (in press). For additional information see Rehder, Schreiner, Wolfe, Laham, Landauer, & Kintsch (in press); Schreiner, Rehder, Landauer, & Laham (1997).

This application is the result of an empirical examination of a theoretical relationship proposed by Kintsch (1994), in which a reader's ability to learn from a text depends on the match between the reader's background knowledge and the difficulty of the text information. LSA is used to predict automatically how much readers will learn from texts, based on the estimated conceptual match between their knowledge of the topic and the information in the text they read. Participants in this study were given tests to assess their knowledge of the human heart and circulatory system, including questionnaires and open-ended essay questions, before and after reading one of four relevant texts that ranged in difficulty from elementary (A) to medical school (D) level. Results show a nonmonotonic relationship in which learning was greatest for texts that were neither too easy nor too difficult. We call this the zone-of-learnability, or the "Goldilocks principle". LSA proved just about as effective at predicting learning from these texts as traditional knowledge assessment measures. For these texts, optimal assignment of text to student on the basis of either pre-reading measure would have increased the amount learned significantly.
Over all texts, the three measures of knowledge used here, namely the pre-reading score on the questionnaire (pre-questionnaire), the grade on the pre-reading essay (pre-essay), and the cosine between the participant's essay vector and Text C (cos essay.standard), were all quite highly correlated: r(pre-questionnaire : pre-essay) = .74, r(pre-questionnaire : cos essay.standard) = .68, and r(pre-essay : cos essay.standard) = .63, all p < .01. For comparison, the correlation between the two professional graders who scored the essays was r = .77. Thus, one can say that the LSA measure of knowledge is about as good as our questionnaire measures, and correlates with human graders almost as well as the human graders correlate among themselves.

Amount of learning was operationally defined in two ways: as the proportion of possible improvement in the scores on the questionnaire from before to after reading (Learn-questionnaire), and as the proportion of possible improvement in the grades the students' essays received before and after reading (Learn-essay). The average cosine between the students' essays and the text they read can be used to predict both proportion-improvement scores, Learn-questionnaire and Learn-essay. The data are shown in Figure 1 for the four groups of college students who read Texts A, B, C, and D, respectively, as well as for the medical students who read only Text A. The latter were included in this analysis because none of the texts was obviously too easy for the college students; to test the zone-of-learnability hypothesis we needed a group of learners who read a text that was clearly too easy for them. The curves fitted to the points in Figure 1 are second-order polynomials.
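The learning measure and the curve fit described above can be sketched as follows. This is an illustration under stated assumptions, not the study's analysis: "proportion of possible improvement" is read here as (post - pre) / (maximum - pre), the helper name `proportion_improvement` is hypothetical, and the five data points are invented to show the shape of the computation.

```python
import numpy as np

def proportion_improvement(pre, post, max_score):
    """Share of the available headroom gained: (post - pre) / (max - pre).

    Assumed reading of "proportion of possible improvement".
    """
    headroom = max_score - pre
    if headroom <= 0:
        raise ValueError("pre-score leaves no room for improvement")
    return (post - pre) / headroom

# Hypothetical group means (NOT data from the study): one point per
# reader group, with the average essay-to-text cosine on the x axis.
mean_cosine = np.array([0.10, 0.20, 0.30, 0.40, 0.50])
pre_scores = np.array([10.0, 10.0, 10.0, 10.0, 10.0])
post_scores = np.array([11.0, 13.0, 14.0, 13.0, 11.0])
max_score = 20.0

learning = np.array([
    proportion_improvement(pre, post, max_score)
    for pre, post in zip(pre_scores, post_scores)
])

# Second-order polynomial fit: learning ~ a * cosine**2 + b * cosine + c
a, b, c = np.polyfit(mean_cosine, learning, deg=2)

# A negative a gives an inverted U; its vertex is the cosine at which
# the fitted learning score is highest.
peak_cosine = -b / (2 * a)
print(a, b, c, peak_cosine)
```

A negative leading coefficient corresponds to the nonmonotonic pattern described in the text, with learning falling off on both sides of an intermediate essay-to-text match.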
The zone-of-learnability hypothesis is supported most clearly by a plot of the average learning scores for the four texts as a function of the average cosine between the students' essays and the text they read.

[Figure 1: plot with axis label "Learning Scores"; recovered tick values 0, 0.1, 0.2, 0.3, 0.4, 0.5.]




Publication date: 1998